Abstract. For large $d$, we study quantum channels on $\mathbf{C}^d$ obtained by selecting randomly $N$
independent Kraus operators according to a probability measure $\mu$ on the unitary group $\mathcal{U}(d)$.
When $\mu$ is the Haar measure, we show that for $N \gtrsim d/\varepsilon^2$, such a channel is
$\varepsilon$-randomizing with high probability, which means that it maps every state within distance
$\varepsilon/d$ (in operator norm) of the maximally mixed state. This slightly improves on a result by
Hayden, Leung, Shor and Winter by optimizing their discretization argument. Moreover, for general $\mu$,
we obtain an $\varepsilon$-randomizing channel provided $N \gtrsim d(\log d)^6/\varepsilon^2$. For $d = 2^k$
($k$ qubits), this includes Kraus operators obtained by tensoring $k$ random Pauli matrices. The proof
uses recent results on empirical processes in Banach spaces.



The completely randomizing quantum channel on $\mathbf{C}^d$ maps every state to the maximally mixed
state $\rho_*$. This channel is used to construct perfect encryption systems (see [1] for formal definitions).
However, it is a complex object in the following sense: any Kraus decomposition must involve at least
$d^2$ operators. It has been shown by Hayden, Leung, Shor and Winter [12] that this "ideal" channel
can be efficiently emulated by lower-complexity channels, leading to approximate encryption systems.
The key point is the existence of good approximations with much shorter Kraus decompositions. More
precisely, say that a quantum channel $\Phi$ on $\mathbf{C}^d$ is $\varepsilon$-randomizing if for any state
$\rho$, $\|\Phi(\rho) - \rho_*\|_\infty \le \varepsilon/d$.
The existence of $\varepsilon$-randomizing channels with $o(d^2)$ Kraus operators has several other
implications [12], such as counterexamples to multiplicativity conjectures [17].

It has been proved in [12] that if $(U_i)_{1 \le i \le N}$ denote independent random matrices
Haar-distributed on the unitary group $\mathcal{U}(d)$, then the quantum channel

\[ \Phi : \rho \mapsto \frac{1}{N} \sum_{i=1}^N U_i \rho U_i^\dagger \tag{1} \]

is $\varepsilon$-randomizing with high probability provided $N \ge Cd\log d/\varepsilon^2$ for some
constant $C$. The proof uses a discretization argument and the fact that the Haar measure satisfies
subgaussian estimates. We show a simple trick that allows us to drop a $\log d$ factor: $\Phi$ is
$\varepsilon$-randomizing when $N \ge Cd/\varepsilon^2$. This is our theorem 1.

The Haar measure is a nice object from the theoretical point of view, but is often too complicated
to implement for concrete situations. Let us say that a measure $\mu$ on $\mathcal{U}(d)$ is isotropic when
$\int U \rho U^\dagger \, d\mu(U) = \rho_*$ for any state $\rho$. When $d = 2^k$, an example of isotropic
measure is given by assigning equal masses at $k$-wise tensor products of Pauli operators.

The following question was asked in [12]: is the quantum channel $\Phi$ defined as (1) $\varepsilon$-randomizing
when $(U_i)$ are distributed according to any isotropic probability measure on $\mathcal{U}(d)$? We answer
this question positively when $N \ge Cd\log^6 d/\varepsilon^2$. This is our main result and appears as
theorem 2. Note that for non-Haar measures, previous results appearing in the literature [12, 2, 8] involved
the weaker trace-norm approximation $\|\Phi(\rho) - \rho_*\|_1 \le \varepsilon$.

As opposed to the Haar measure, the measure $\mu$ need not have subgaussian tails, and we need
more sophisticated tools to prove theorem 2. We use recent results on suprema of empirical processes
in Banach spaces. After early work by Rudelson [15] and Guédon-Rudelson [11], a general sharp
inequality was obtained by Guédon, Mendelson, Pajor and Tomczak-Jaegermann [10]. This inequality
is valid in any Banach space with a sufficiently regular equivalent norm, such as $\ell_1^d$. The problem
of $\varepsilon$-randomizing channels involves the supremum of an empirical process in the trace-class
space $S_1^d$ (the non-commutative analogue of $\ell_1^d$), which fits this setting perfectly.

The paper is organized as follows. Section 2 contains background and precise statements of the
theorems. Theorem 1 (for the Haar measure) is proved in section 3. Theorem 2 (for a general measure) is
proved in section 4. An appendix contains the needed facts about geometry and probability in Banach
spaces.

Acknowledgement. I thank Andreas Winter for several e-mail exchanges on the topic, and I am 
very grateful to Alain Pajor for showing me that the results of [10] can be applied here. 

2. Background and presentation of results 

Throughout the paper, the letters $C$ and $c$ denote absolute constants whose value may change from
occurrence to occurrence. We usually do not pay too much attention to the value of these constants.

2.1. Schatten classes. We write $\mathcal{M}(\mathbf{C}^d)$ for the space of complex $d \times d$ matrices.
If $A \in \mathcal{M}(\mathbf{C}^d)$, let $s_1(A), \dots, s_d(A)$ denote the singular values of $A$ (defined
as the square roots of the eigenvalues of $AA^\dagger$). For $1 \le p \le \infty$, the Schatten $p$-norm is
defined as

\[ \|A\|_p = \Big( \sum_{i=1}^d s_i(A)^p \Big)^{1/p} . \]

For $p = \infty$, the definition should be understood as $\|A\|_\infty = \max_i s_i(A)$ and coincides with
the usual operator norm. It is well-known (see [5], section IV.2) that $(\mathcal{M}(\mathbf{C}^d), \|\cdot\|_p)$
is a complex normed space, denoted $S_p^d$ and called Schatten class. The space $S_p^d$ is the
non-commutative analogue of the space $\ell_p^d$. We write $B(S_p^d)$ for the unit ball of $S_p^d$.

The Schatten 2-norm (sometimes called Hilbert-Schmidt or Frobenius norm) is a Hilbert space
norm associated to the inner product $\langle A, B \rangle = \mathrm{Tr}\, A^\dagger B$. This Hermitian
structure allows us to identify $\mathcal{M}(\mathbf{C}^d)$ with its dual space. Duality on Schatten norms
holds as in the commutative case: if $p$ and $q$ are conjugate exponents (i.e. $1/p + 1/q = 1$), then the
normed space dual to $S_p^d$ coincides with $S_q^d$.

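2.2. Completely positive maps. We write $\mathcal{M}_{sa}(\mathbf{C}^d)$ (resp. $\mathcal{M}_+(\mathbf{C}^d)$)
for the set of self-adjoint (resp. positive semi-definite) $d \times d$ matrices. A linear map
$\Phi : \mathcal{M}(\mathbf{C}^d) \to \mathcal{M}(\mathbf{C}^d)$ is said to preserve positivity if
$\Phi(\mathcal{M}_+(\mathbf{C}^d)) \subset \mathcal{M}_+(\mathbf{C}^d)$. Moreover, $\Phi$ is said to be
completely positive if for any $k \in \mathbf{N}$, the map

\[ \Phi \otimes \mathrm{Id}_{\mathcal{M}(\mathbf{C}^k)} : \mathcal{M}(\mathbf{C}^d \otimes \mathbf{C}^k) \to \mathcal{M}(\mathbf{C}^d \otimes \mathbf{C}^k) \]

preserves positivity. We use freely the canonical identification
$\mathcal{M}(\mathbf{C}^d) \otimes \mathcal{M}(\mathbf{C}^k) \approx \mathcal{M}(\mathbf{C}^d \otimes \mathbf{C}^k)$.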

If $(e_i)_{0 \le i \le d-1}$ denotes the canonical basis of $\mathbf{C}^d$, let $E_{ij} = |e_i\rangle\langle e_j|$.
To $\Phi : \mathcal{M}(\mathbf{C}^d) \to \mathcal{M}(\mathbf{C}^d)$ we associate
$A_\Phi \in \mathcal{M}(\mathbf{C}^d \otimes \mathbf{C}^d)$ defined as

\[ A_\Phi = \sum_{i,j=0}^{d-1} E_{ij} \otimes \Phi(E_{ij}) . \]

The matrix $A_\Phi$ is called the Choi matrix of $\Phi$; it is well-known [7] that $\Phi$ is completely
positive if and only if $A_\Phi$ is positive. Therefore, the set of completely positive operators on
$\mathcal{M}(\mathbf{C}^d)$ is in one-to-one correspondence with $\mathcal{M}_+(\mathbf{C}^d \otimes \mathbf{C}^d)$.
This correspondence is known as the Choi-Jamiołkowski isomorphism.

The spectral decomposition of $A_\Phi$ now implies the following: any completely positive map $\Phi$ on
$\mathcal{M}(\mathbf{C}^d)$ can be decomposed as

\[ \Phi : X \mapsto \sum_{i=1}^N V_i X V_i^\dagger . \tag{2} \]

Here $V_1, \dots, V_N$ are elements of $\mathcal{M}(\mathbf{C}^d)$. This decomposition is called a Kraus
decomposition of $\Phi$ of length $N$. The minimal length of a Kraus decomposition of $\Phi$ (called Kraus
rank) is equal to the rank of the Choi matrix $A_\Phi$. In particular it is always bounded by $d^2$.
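The link between Kraus rank and the rank of the Choi matrix can be checked numerically on small examples. The following is only an illustrative sketch in pure Python; the helper names (`apply_channel`, `choi_matrix`, `rank`) are hypothetical and not taken from the paper. It builds the Choi matrix from a list of Kraus operators and computes its numerical rank, for the identity channel on one qubit and for the channel whose Kraus operators are the four Pauli matrices divided by 2 (the single-qubit randomizing channel).

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[c][r]).conjugate() for c in range(len(a))]
            for r in range(len(a[0]))]

def apply_channel(kraus, x):
    """Return sum_k V_k x V_k^dagger for a list of Kraus operators V_k."""
    d = len(x)
    out = [[0j] * d for _ in range(d)]
    for v in kraus:
        term = matmul(matmul(v, x), dagger(v))
        for r in range(d):
            for c in range(d):
                out[r][c] += term[r][c]
    return out

def choi_matrix(kraus):
    """Choi matrix sum_{i,j} E_ij (x) Phi(E_ij), stored as a d^2 x d^2 matrix."""
    d = len(kraus[0])
    choi = [[0j] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            e_ij = [[1 if (r, c) == (i, j) else 0 for c in range(d)] for r in range(d)]
            block = apply_channel(kraus, e_ij)
            for r in range(d):
                for c in range(d):
                    choi[i * d + r][j * d + c] = block[r][c]
    return choi

def rank(m, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [[complex(v) for v in row] for row in m]
    rows, cols = len(m), len(m[0])
    rk = 0
    for c in range(cols):
        if rk == rows:
            break
        pivot = max(range(rk, rows), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, rows):
            f = m[r][c] / m[rk][c]
            for cc in range(c, cols):
                m[r][cc] -= f * m[rk][cc]
        rk += 1
    return rk

identity_channel = [[[1, 0], [0, 1]]]
half_paulis = [[[0.5, 0], [0, 0.5]], [[0, 0.5], [0.5, 0]],
               [[0, -0.5j], [0.5j, 0]], [[0.5, 0], [0, -0.5]]]
print(rank(choi_matrix(identity_channel)), rank(choi_matrix(half_paulis)))  # prints "1 4"
```

The second value equals $d^2 = 4$, the maximal possible Kraus rank for $d = 2$.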

2.3. States and the completely depolarizing channel. A state on $\mathbf{C}^d$ is an element of
$\mathcal{M}_+(\mathbf{C}^d)$ with trace 1. We write $\mathcal{D}(\mathbf{C}^d)$ for the set of states; it is a
compact convex set with (real) dimension $d^2 - 1$. If $x \in \mathbf{C}^d$ is a unit vector, we write
$P_x = |x\rangle\langle x|$ for the associated rank one projector. The state $P_x$ is called a pure state,
and it follows from spectral decomposition that any state is a convex combination of pure states. A
central role is played by the maximally mixed state $\rho_* = \mathrm{Id}/d$ ($\rho_*$ is sometimes called
the random state).

A quantum channel $\Phi : \mathcal{M}(\mathbf{C}^d) \to \mathcal{M}(\mathbf{C}^d)$ is a completely positive
map which preserves trace: for any $X \in \mathcal{M}(\mathbf{C}^d)$, $\mathrm{Tr}\, \Phi(X) = \mathrm{Tr}\, X$.
Note that a quantum channel maps states to states. The trace-preserving condition reads on the Kraus
decomposition (2) as

\[ \sum_{i=1}^N V_i^\dagger V_i = \mathrm{Id} . \]

An example of a quantum channel that plays a central role in quantum information theory is the
(completely) randomizing channel (also called completely depolarizing channel)
$R : \mathcal{M}(\mathbf{C}^d) \to \mathcal{M}(\mathbf{C}^d)$,

\[ R : X \mapsto \mathrm{Tr}\, X \cdot \frac{\mathrm{Id}}{d} . \]

The randomizing channel maps every state to $\rho_*$. The Choi matrix of $R$ is
$A_R = \frac{1}{d} \mathrm{Id}_{\mathbf{C}^d \otimes \mathbf{C}^d}$. Since $A_R$ has full rank, any Kraus
decomposition of $R$ must have length (at least) $d^2$. An explicit decomposition can be written using
Fourier-type unitary operators: let $\omega = \exp(2i\pi/d)$ and $A$ and $B$ the matrices defined as

\[ A(e_j) = e_{j+1 \bmod d}, \qquad B(e_j) = \omega^j e_j . \tag{3} \]

For $1 \le j, k \le d$, define $V_{j,k}$ as the product $B^j A^k$. Note that $V_{j,k}$ belongs to the unitary
group $\mathcal{U}(d)$. A routine calculation (see also section 2.5) shows that for any
$X \in \mathcal{M}(\mathbf{C}^d)$,

\[ \frac{1}{d^2} \sum_{j,k=1}^d V_{j,k} X V_{j,k}^\dagger = \mathrm{Tr}\, X \cdot \frac{\mathrm{Id}}{d} . \]

This is a Kraus decomposition of the randomizing channel.
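The "routine calculation" can be confirmed numerically for a small dimension. The sketch below is an illustration only, in pure Python with hypothetical helper names (`shift_and_clock`, `average_over_weyl`); it averages $V X V^\dagger$ over the $d^2$ products $V = B^j A^k$ and compares the result with $\mathrm{Tr}\,X \cdot \mathrm{Id}/d$ for an arbitrary $3 \times 3$ matrix of trace 3.

```python
import cmath

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[c][r]).conjugate() for c in range(len(a))]
            for r in range(len(a[0]))]

def identity(d):
    return [[1 if r == c else 0 for c in range(d)] for r in range(d)]

def shift_and_clock(d):
    """A e_j = e_{j+1 mod d} (shift) and B e_j = omega^j e_j (clock)."""
    omega = cmath.exp(2j * cmath.pi / d)
    a = [[1 if r == (c + 1) % d else 0 for c in range(d)] for r in range(d)]
    b = [[omega ** r if r == c else 0 for c in range(d)] for r in range(d)]
    return a, b

def average_over_weyl(x):
    """Return (1/d^2) * sum_{j,k=1}^{d} (B^j A^k) x (B^j A^k)^dagger."""
    d = len(x)
    a, b = shift_and_clock(d)
    out = [[0j] * d for _ in range(d)]
    b_pow = identity(d)
    for _j in range(d):
        b_pow = matmul(b_pow, b)          # B^j for j = 1, ..., d
        a_pow = identity(d)
        for _k in range(d):
            a_pow = matmul(a_pow, a)      # A^k for k = 1, ..., d
            v = matmul(b_pow, a_pow)
            term = matmul(matmul(v, x), dagger(v))
            for r in range(d):
                for c in range(d):
                    out[r][c] += term[r][c] / d ** 2
    return out

x = [[1, 2j, 0], [0, 3, 1], [5, 0, -1]]   # arbitrary test matrix with trace 3
result = average_over_weyl(x)
max_error = max(abs(result[r][c] - (1 if r == c else 0))
                for r in range(3) for c in range(3))
print(max_error)   # tiny: the average is Tr(x)/3 times the identity
```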
2.4. $\varepsilon$-randomizing channels. We are interested in approximating the randomizing channel $R$ by
channels with low Kraus rank. Following Hayden, Leung, Shor and Winter [12], a quantum channel $\Phi$
is called $\varepsilon$-randomizing if for any state $\rho \in \mathcal{D}(\mathbf{C}^d)$,

\[ \|\Phi(\rho) - \rho_*\|_\infty \le \frac{\varepsilon}{d} . \]

It is equivalent to say that the spectrum of $\Phi(\rho)$ is contained in
$[(1 - \varepsilon)/d, (1 + \varepsilon)/d]$ for any state $\rho$. It has been proved in [12] that there exist
$\varepsilon$-randomizing channels with Kraus rank equal to $Cd\log d/\varepsilon^2$ for some constant $C$.
This is much smaller than $d^2$ (the Kraus rank of $R$). The construction is simple: generate independent
random Kraus operators according to the Haar measure on $\mathcal{U}(d)$ and show that the induced quantum
channel is $\varepsilon$-randomizing with nonzero probability. A key step in the proof is a discretization
argument. We show that a simple trick improves the efficiency of the argument from [12] to prove the
following

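Theorem 1 (Haar-generated $\varepsilon$-randomizing channels). Let $(U_i)_{1 \le i \le N}$ be independent
random matrices Haar-distributed on the unitary group $\mathcal{U}(d)$. Let
$\Phi : \mathcal{M}(\mathbf{C}^d) \to \mathcal{M}(\mathbf{C}^d)$ be the quantum channel defined by

\[ \Phi(\rho) = \frac{1}{N} \sum_{i=1}^N U_i \rho U_i^\dagger . \]

Assume that $0 < \varepsilon < 1$ and $N \ge Cd/\varepsilon^2$. Then the channel $\Phi$ is
$\varepsilon$-randomizing with nonzero probability.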

As often with random constructions, we actually prove that the conclusion holds true with large
probability: the probability of failure is exponentially small in $d$.

It is clear that the way $N$ depends on $d$ is optimal: if $\Phi$ is an $\varepsilon$-randomizing channel
with $\varepsilon < 1$, its Kraus rank must be at least $d$. This is because for any pure state $P_x$,
$\Phi(P_x)$ must have full rank. The dependence on $\varepsilon$ is sharp for channels as constructed here,
since lemma 2 below is sharp. However, it is not clear whether families of $\varepsilon$-randomizing
channels with a better dependence on $\varepsilon$ can be found using a different construction, possibly
partially deterministic.

One checks (using the value $c = 1/6$ from [12] in lemma 3 and optimizing over the net size) that
the constant in theorem 1 can be chosen to be, e.g., $C = 150$. This is presumably far from optimal.

2.5. Isotropic measures on unitary matrices. Although the quantum channels constructed in
theorem 1 have minimal Kraus rank, it can be argued that Haar-distributed random matrices are hard
to generate in real-life situations. We introduce a wide class of measures on $\mathcal{U}(d)$ that may
replace the Haar measure.

Definition. Say that a probability measure $\mu$ on $\mathcal{U}(d)$ is isotropic if for any
$X \in \mathcal{M}(\mathbf{C}^d)$,

\[ \int_{\mathcal{U}(d)} U X U^\dagger \, d\mu(U) = \mathrm{Tr}\, X \cdot \frac{\mathrm{Id}}{d} . \]

Similarly, a $\mathcal{U}(d)$-valued random vector is called isotropic if its law is isotropic.

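Lemma 1. Let $U = (U_{ij})$ be a $\mathcal{U}(d)$-valued random vector. The following assertions are
equivalent:

(1) $U$ is isotropic.

(2) For any $X \in \mathcal{M}(\mathbf{C}^d)$, $\mathbf{E} |\mathrm{Tr}\, U X^\dagger|^2 = \frac{1}{d} \|X\|_2^2$.

(3) For any indices $i, j, k, l$, $\mathbf{E}\, U_{ij} \overline{U_{kl}} = \frac{1}{d} \delta_{i,k} \delta_{j,l}$.

Proof. Implications (3) $\Rightarrow$ (1) and (3) $\Rightarrow$ (2) are easily checked by expansion. For
(1) $\Rightarrow$ (3), simply take $X = |e_j\rangle\langle e_k|$. Identity (2) implies after polarization
that for any $A, B \in \mathcal{M}(\mathbf{C}^d)$,

\[ \mathbf{E}\, \mathrm{Tr}(U A^\dagger)\, \overline{\mathrm{Tr}(U B^\dagger)} = \frac{1}{d} \mathrm{Tr}\, A^\dagger B , \]

from which (3) follows. □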

Condition (3) of the lemma means that the covariance matrix of $U$ (which is an element of
$\mathcal{M}(\mathcal{M}(\mathbf{C}^d))$) is a multiple of the identity matrix.

Of course the Haar measure is isotropic. Other examples are provided by discrete measures. Let
$\mathscr{U} = \{U_1, \dots, U_{d^2}\}$ be a family of unitary matrices which are mutually orthogonal in the
following sense: if $i \ne j$, then $\mathrm{Tr}\, U_i^\dagger U_j = 0$. For example, one can take
$\mathscr{U} = \{B^j A^k\}_{1 \le j,k \le d}$, with $A$, $B$ defined as in (3). Then the uniform probability
measure on $\mathscr{U}$ is isotropic. Indeed, any $X \in \mathcal{M}(\mathbf{C}^d)$ can be decomposed
as $X = \sum x_i U_i$ and condition (2) of lemma 1 is easily checked.

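If we specialize to $d = 2$, we obtain a random Pauli operator: assign probability $1/4$ to each of the
following matrices to get an isotropic measure

\[ \sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
   \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
   \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
   \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \]

It is straightforward to check that isotropic vectors tensorize: if $X_1 \in \mathcal{U}(d_1)$ and
$X_2 \in \mathcal{U}(d_2)$ are isotropic, so is $X_1 \otimes X_2 \in \mathcal{U}(d_1 d_2)$. If we work on
$\mathcal{M}((\mathbf{C}^2)^{\otimes k})$, which corresponds to a set of $k$ qubits, a natural isotropic
measure is therefore obtained by choosing independently a Pauli matrix on each qubit, i.e. assigning
mass $1/4^k$ to the matrix $\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_k}$ for any
$(i_1, \dots, i_k) \in \{0, 1, 2, 3\}^k$.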
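Condition (3) of lemma 1 gives a finite check of isotropy for a discrete measure. As a hedged illustration (pure Python, with the hypothetical helper names `kron` and `isotropy_defect`), the sketch below evaluates the largest deviation from $\mathbf{E}\, U_{ij} \overline{U_{kl}} = \frac{1}{d}\delta_{i,k}\delta_{j,l}$ for the uniform measure on the four Pauli matrices and on their 16 two-fold tensor products.

```python
from itertools import product

# The four Pauli matrices sigma_0, ..., sigma_3 as nested lists.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[r // m][c // m] * b[r % m][c % m] for c in range(n * m)]
            for r in range(n * m)]

def isotropy_defect(unitaries):
    """Largest violation of E[U_ij conj(U_kl)] = delta_ik delta_jl / d
    for the uniform measure on the given list of d x d matrices."""
    d = len(unitaries[0])
    worst = 0.0
    for i, j, k, l in product(range(d), repeat=4):
        mean = sum(u[i][j] * complex(u[k][l]).conjugate()
                   for u in unitaries) / len(unitaries)
        target = 1.0 / d if (i == k and j == l) else 0.0
        worst = max(worst, abs(mean - target))
    return worst

two_qubit_paulis = [kron(a, b) for a in PAULIS for b in PAULIS]
print(isotropy_defect(PAULIS), isotropy_defect(two_qubit_paulis))
```

Both defects vanish up to rounding, whereas a measure concentrated on a single unitary (for instance the identity alone) gives a defect of order 1.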

2.6. $\varepsilon$-randomizing channels for an isotropic measure. We can now state our main theorem,
asserting that, up to logarithmic terms, the Haar measure can be replaced in theorem 1 by simpler
notions of randomness.

Theorem 2 (General $\varepsilon$-randomizing channels). Let $\mu$ be an isotropic measure on the unitary
group $\mathcal{U}(d)$. Let $(U_i)_{1 \le i \le N}$ be independent $\mu$-distributed random matrices, and
$\Phi : \mathcal{M}(\mathbf{C}^d) \to \mathcal{M}(\mathbf{C}^d)$ be the quantum channel defined as

\[ \Phi(\rho) = \frac{1}{N} \sum_{i=1}^N U_i \rho U_i^\dagger . \tag{4} \]

Assume that $0 < \varepsilon < 1$ and $N \ge Cd(\log d)^6/\varepsilon^2$. Then the channel $\Phi$ is
$\varepsilon$-randomizing with nonzero probability.

Theorem 2 applies in particular to products of random Pauli matrices as described in the previous
section. It is of interest for certain cryptographic applications to know that $\varepsilon$-randomizing
channels can be realized using Pauli matrices.

As opposed to theorem 1, the conclusion of theorem 2 is not proved to hold with probability
exponentially close to 1. Applying the theorem with $\varepsilon\eta$ instead of $\varepsilon$ and using
Markov's inequality shows that $\Phi$ is $\varepsilon$-randomizing with probability larger than $1 - \eta$
provided $N \ge Cd\log^6 d/(\varepsilon^2\eta^2)$.
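The Markov step can be spelled out as follows. This is a short sketch, assuming (as in the proof given in section 4) that what is actually bounded is the expectation of the random quantity $M = \sup_\rho \|\Phi(\rho) - \rho_*\|_\infty$; the channel is $\varepsilon$-randomizing exactly when $M \le \varepsilon/d$.

```latex
% Run the argument with \varepsilon\eta in place of \varepsilon, i.e. assume
% N \ge C d \log^6 d / (\varepsilon\eta)^2, so that E M \le \varepsilon\eta / d.
\[
  \mathbf{E} M \le \frac{\varepsilon\eta}{d}
  \quad\Longrightarrow\quad
  \mathbf{P}\Big( M > \frac{\varepsilon}{d} \Big)
  \le \frac{\mathbf{E} M}{\varepsilon/d}
  \le \eta .
\]
```

The price for a failure probability $\eta$ is therefore a factor $1/\eta^2$ in $N$, in contrast with the exponentially small failure probability available in the Haar case.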

Theorem 2 could be quickly deduced from a theorem appearing in [10]. However, the proof of [10]
is rather intricate and uses Talagrand's majorizing measures in a central way. We give here a proof of
our theorem which uses the simpler Dudley integral instead, giving the same result. We however rely
on an entropy lemma from [10], which appears as lemma A5 in the appendix.

The $\log^6 d$ appearing in theorem 2 is certainly not optimal (see remarks at the end of the paper).
However, some power of $\log d$ is needed, as shown by the next proposition.

Proposition. Let $A$, $B$ be defined as in (3) and $\mu$ be the uniform measure on the set
$\{B^j A^k\}_{1 \le j,k \le d}$. Consider $(X_i)$ independent $\mu$-distributed random unitary matrices. If
the quantum channel $\Phi$ defined as in (4) is $\frac{1}{2}$-randomizing with probability larger than
$1/2$, then $N \ge cd\log d$.

Proof. We will rely on the following standard result in elementary probability theory known as the
coupon collector's problem (see [9], Chapter 1, example 5.10): if we choose independently and uniformly
random elements among a set of $d$ elements, the mean (and also the median) number of choices before
getting all elements at least once is equivalent to $d \log d$ for large $d$.

In our case, remember that $\omega = \exp(2i\pi/d)$ and for $0 \le j \le d - 1$, define

\[ x_j = \Big( \frac{1}{\sqrt{d}}, \frac{\omega^j}{\sqrt{d}}, \frac{\omega^{2j}}{\sqrt{d}}, \dots, \frac{\omega^{(d-1)j}}{\sqrt{d}} \Big) . \]

Note that $\mathscr{B} = (x_j)_{0 \le j \le d-1}$ is an orthonormal basis of $\mathbf{C}^d$ and that
$B^j A^k x_0 = x_j$. Consequently, if $U$ is $\mu$-distributed, the random state $U P_{x_0} U^\dagger$ equals
$P_{x_j}$ with probability $1/d$. In the basis $\mathscr{B}$, the matrix $\Phi(P_{x_0})$ is diagonal. Note
that if $\Phi$ is $\frac{1}{2}$-randomizing, then $\Phi(P_{x_0})$ must have full rank. The reduction to the
coupon collector's problem is now immediate. □
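To make the reduction concrete: in the basis of the vectors $x_j$, the image of the pure state $P_{x_0}$ is diagonal, and its $j$-th diagonal entry is the fraction of the $N$ draws that landed on $x_j$, so full rank means every index has been drawn at least once. The sketch below is illustrative only (function names are hypothetical); it compares the exact mean of the coupon collector time, $d H_d$ with $H_d$ the harmonic number, with $d \log d$ and with a seeded simulation.

```python
import math
import random

def expected_draws(d):
    """Exact mean number of uniform draws needed to see all d items: d * H_d."""
    return d * sum(1.0 / k for k in range(1, d + 1))

def simulate_draws(d, rng):
    """Draw uniformly from {0, ..., d-1} until every value has appeared."""
    seen = set()
    draws = 0
    while len(seen) < d:
        seen.add(rng.randrange(d))
        draws += 1
    return draws

d = 64
rng = random.Random(0)          # fixed seed so that the run is reproducible
trials = 2000
average = sum(simulate_draws(d, rng) for _ in range(trials)) / trials
print(expected_draws(d), d * math.log(d), average)
```

The exact mean exceeds $d \log d$ by a term of order $d$, which is consistent with the asymptotic equivalence used in the proof.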

3. Proof of theorem 1: Haar-distributed unitary operators.

The scheme of the proof is similar to [12]. We need two lemmas from there. The first is a deviation
inequality sometimes known as Bernstein's inequality. The second is proved by a volumetric argument.

Lemma 2 (Lemma II.3 in [12]). Let $\varphi, \psi$ be pure states on $\mathbf{C}^d$ and $(U_i)_{1 \le i \le N}$
be independent Haar-distributed random unitary matrices. Then for every $0 < \delta < 1$,

\[ \mathbf{P}\Big( \Big| \frac{1}{N} \sum_{i=1}^N \mathrm{Tr}(U_i \varphi U_i^\dagger \psi) - \frac{1}{d} \Big| \ge \frac{\delta}{d} \Big) \le 2\exp(-c\delta^2 N) . \]

Lemma 3 (Lemma II.4 in [12]). For $0 < \delta < 1$ there exists a set $\mathcal{N}$ of pure states on
$\mathbf{C}^d$ with $|\mathcal{N}| \le (5/\delta)^{2d}$, such that for every pure state $\varphi$ on
$\mathbf{C}^d$, there exists $\varphi_0 \in \mathcal{N}$ such that $\|\varphi - \varphi_0\|_1 \le \delta$.
Such a set $\mathcal{N}$ is called a $\delta$-net.

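The improvement on the result of [12] will follow from the next lemma.

Lemma 4 (Computing norms on nets). Let $\Delta : \mathcal{B}(\mathbf{C}^d) \to \mathcal{B}(\mathbf{C}^d)$ be a
Hermitian-preserving linear map. Let $A$ be the quantity

\[ A = \sup_{\varphi \in \mathcal{D}(\mathbf{C}^d)} \|\Delta(\varphi)\|_\infty = \sup_{\varphi, \psi \in \mathcal{D}(\mathbf{C}^d)} |\mathrm{Tr}\, \psi \Delta(\varphi)| . \]

Let $0 < \delta < 1/2$ and $\mathcal{N}$ be a $\delta$-net as provided by lemma 3. We can evaluate $A$ as
follows:

\[ A \le \frac{1}{1 - 2\delta} B , \qquad \text{where} \quad B = \sup_{\varphi_0, \psi_0 \in \mathcal{N}} |\mathrm{Tr}\, \psi_0 \Delta(\varphi_0)| . \]

Proof of lemma 4. First note that for any self-adjoint operators $a, b \in \mathcal{B}(\mathbf{C}^d)$, we have

\[ |\mathrm{Tr}\, b \Delta(a)| \le A \|a\|_1 \|b\|_1 . \tag{5} \]

By a convexity argument, the supremum in $A$ can be restricted to pure states. Given pure states
$\varphi, \psi \in \mathcal{D}(\mathbf{C}^d)$, let $\varphi_0, \psi_0 \in \mathcal{N}$ so that
$\|\varphi - \varphi_0\|_1 \le \delta$, $\|\psi - \psi_0\|_1 \le \delta$. Then

\[ |\mathrm{Tr}\, \psi \Delta(\varphi)| \le |\mathrm{Tr}(\psi - \psi_0) \Delta(\varphi)| + |\mathrm{Tr}\, \psi_0 \Delta(\varphi - \varphi_0)| + |\mathrm{Tr}\, \psi_0 \Delta(\varphi_0)| . \]

Using (5) twice and taking the supremum over $\varphi, \psi$ gives $A \le \delta A + \delta A + B$, hence the
result. □

Proof of the theorem. Let $R$ be the randomizing channel. Fix a $\frac{1}{4}$-net $\mathcal{N}$ with
$|\mathcal{N}| \le 20^{2d}$, as provided by lemma 3. Let $\Delta = R - \Phi$ and $A$, $B$ as in lemma 4. Here
$A$ and $B$ are random quantities and it follows from lemma 4 that

\[ \mathbf{P}\Big( A \ge \frac{\varepsilon}{d} \Big) \le \mathbf{P}\Big( B \ge \frac{\varepsilon}{2d} \Big) . \]

Using the union bound and lemma 2, we get

\[ \mathbf{P}\Big( B \ge \frac{\varepsilon}{2d} \Big) \le 20^{4d} \cdot 2\exp(-c\varepsilon^2 N/4) . \]

This is less than 1 if $N \ge Cd/\varepsilon^2$, for some constant $C$. □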
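The constant produced by this argument can be tracked explicitly. The following sketch is an illustration under stated assumptions, not the paper's own computation: it redoes the union bound with a general $\delta$-net (of size at most $(5/\delta)^{2d}$, so at most $(5/\delta)^{4d}$ pairs) and deviation $(1 - 2\delta)\varepsilon/d$ in the deviation inequality, which gives the sufficient condition $N > \big(4d\log(5/\delta) + \log 2\big)/\big(c(1 - 2\delta)^2\varepsilon^2\big)$. The function name `kraus_constant` is hypothetical, and the lower-order $\log 2$ term is ignored.

```python
import math

def kraus_constant(delta, c=1.0 / 6.0):
    """Leading constant C(delta) such that N >= C(delta) * d / eps^2 makes the
    union bound (5/delta)^(4d) * 2 * exp(-c (1 - 2 delta)^2 eps^2 N) small.
    The default c = 1/6 is the value quoted in the text from [12]."""
    if not 0.0 < delta < 0.5:
        raise ValueError("delta must lie strictly between 0 and 1/2")
    return 4.0 * math.log(5.0 / delta) / (c * (1.0 - 2.0 * delta) ** 2)

# The 1/4-net used in the proof, then a crude grid search over the net size.
constant_quarter_net = kraus_constant(0.25)
grid = [k / 100.0 for k in range(1, 50)]
best_delta = min(grid, key=kraus_constant)
print(constant_quarter_net, best_delta, kraus_constant(best_delta))
```

With $\delta = 1/4$ the constant is $96\log 20$, roughly 288; a finer net brings it below 150, in line with the value of $C$ quoted after theorem 1.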



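4. Proof of theorem 2: general unitary operators.

A Bernoulli random variable is a random variable $\varepsilon$ so that
$\mathbf{P}(\varepsilon = 1) = \mathbf{P}(\varepsilon = -1) = 1/2$. Recall that $C$ denotes an absolute
constant whose value may change from occurrence to occurrence. We will derive theorem 2 from the
following lemma.

Lemma 5. Let $U_1, \dots, U_N \in \mathcal{U}(d)$ be deterministic unitary operators and let $(\varepsilon_i)$
be a sequence of independent Bernoulli random variables. Then

\[ \mathbf{E}_\varepsilon \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \sum_{i=1}^N \varepsilon_i U_i \rho U_i^\dagger \Big\|_\infty
   \le C (\log d)^{5/2} \sqrt{\log N} \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \sum_{i=1}^N U_i \rho U_i^\dagger \Big\|_\infty^{1/2} . \tag{6} \]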

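Proof of theorem 2 (assuming lemma 5). Let $\mu$ be an isotropic measure on $\mathcal{U}(d)$ and $(U_i)$ be
independent $\mu$-distributed random unitary matrices. Let $M$ be the random quantity

\[ M = \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \frac{1}{N} \sum_{i=1}^N U_i \rho U_i^\dagger - \frac{\mathrm{Id}}{d} \Big\|_\infty . \]

We are going to show that $\mathbf{E} M$ is small. The first step is a standard symmetrization argument. Let
$(U_i')$ be independent copies of $(U_i)$ and $(\varepsilon_i)$ be a sequence of independent Bernoulli random
variables. We indicate as a subscript the random variables with respect to which the expectation is taken.

\begin{align*}
\mathbf{E} M
 &\le \mathbf{E}_{U,U'} \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \frac{1}{N} \sum_{i=1}^N \big( U_i \rho U_i^\dagger - U_i' \rho U_i'^\dagger \big) \Big\|_\infty \\
 &= \mathbf{E}_{U,U',\varepsilon} \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \frac{1}{N} \sum_{i=1}^N \varepsilon_i \big( U_i \rho U_i^\dagger - U_i' \rho U_i'^\dagger \big) \Big\|_\infty \\
 &\le 2\, \mathbf{E}_{U,\varepsilon} \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \frac{1}{N} \sum_{i=1}^N \varepsilon_i U_i \rho U_i^\dagger \Big\|_\infty .
\end{align*}

The inequality of the first line is Jensen's inequality for $\mathbf{E}_{U'}$, while the equality on the
second line holds since the distribution of $\rho \mapsto U_i \rho U_i^\dagger - U_i' \rho U_i'^\dagger$ is
symmetric (as an $\mathcal{M}(\mathcal{M}(\mathbf{C}^d), \mathcal{M}(\mathbf{C}^d))$-valued random vector). We
then decouple the expectations using lemma 5 for fixed $(U_i)$.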



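\begin{align*}
\mathbf{E} M
 &\le \frac{C}{N} (\log d)^{5/2} \sqrt{\log N}\; \mathbf{E} \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \sum_{i=1}^N U_i \rho U_i^\dagger \Big\|_\infty^{1/2} \\
 &\le \frac{C}{\sqrt{N}} (\log d)^{5/2} \sqrt{\log N}\; \mathbf{E} \sqrt{M + \frac{1}{d}} \\
 &\le \frac{C}{\sqrt{N}} (\log d)^{5/2} \sqrt{\log N}\; \sqrt{\mathbf{E} M + \frac{1}{d}} .
\end{align*}

Using the elementary implication

\[ X \le \alpha \sqrt{X + \beta} \quad \Longrightarrow \quad X \le \alpha^2 + \alpha \sqrt{\beta} , \]

we find that $\mathbf{E} M \le \varepsilon/d$ provided $N \ge Cd\log^6 d/\varepsilon^2$. □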
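For completeness, here is one way to verify the elementary implication used at the end of this proof, namely that $X \le \alpha\sqrt{X + \beta}$ forces $X \le \alpha^2 + \alpha\sqrt{\beta}$, and to see how it is applied. This is a sketch: the choice of $\alpha$ and $\beta$ is read off from the chain of inequalities above, and the final step assumes that $\log N$ is at most a constant multiple of $\log d$, which is an assumption made here for the regime of interest ($N$ at most polynomial in $d$).

```latex
% Claim: for X, \alpha, \beta \ge 0,
%   X \le \alpha\sqrt{X+\beta}  implies  X \le \alpha^2 + \alpha\sqrt{\beta}.
\begin{align*}
X \le \alpha\sqrt{X+\beta}
 &\implies X^2 - \alpha^2 X - \alpha^2\beta \le 0 \\
 &\implies X \le \tfrac12\Big(\alpha^2 + \sqrt{\alpha^4 + 4\alpha^2\beta}\Big)
   \le \tfrac12\Big(\alpha^2 + \alpha^2 + 2\alpha\sqrt{\beta}\Big)
   = \alpha^2 + \alpha\sqrt{\beta} .
\end{align*}
% Application: X = \mathbf{E}M, \ \alpha = C N^{-1/2} (\log d)^{5/2}\sqrt{\log N}, \ \beta = 1/d.
\[
\mathbf{E} M \le \alpha^2 + \frac{\alpha}{\sqrt d},
\qquad\text{and}\qquad
\alpha \le \frac{\varepsilon}{2\sqrt d}
\;\Longrightarrow\;
\mathbf{E} M \le \frac{\varepsilon^2}{4d} + \frac{\varepsilon}{2d} \le \frac{\varepsilon}{d}.
\]
% The condition \alpha \le \varepsilon/(2\sqrt d) reads
%   N / \log N \ge 4 C^2 d (\log d)^5 / \varepsilon^2 ,
% which holds for N \ge C' d \log^6 d / \varepsilon^2 when \log N \le C'' \log d.
```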



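It remains to prove lemma 5. We will use several standard concepts from geometry and probability
in Banach spaces. All the relevant definitions and statements are postponed to the next section.

Proof of lemma 5. Let $Z$ be the random quantity whose expectation appears in the left-hand side of (6).
By a convexity argument, the supremum is attained for an extremal $\rho$, i.e. a pure state
$P_x = |x\rangle\langle x|$ for some unit vector $x$. Since the operator norm itself can be written as a
supremum over unit vectors, we get

\begin{align*}
Z &= \sup_{|x| = |y| = 1} \Big| \sum_{i=1}^N \varepsilon_i \langle y | U_i | x \rangle \langle x | U_i^\dagger | y \rangle \Big|
   = \sup_{|x| = |y| = 1} \Big| \sum_{i=1}^N \varepsilon_i |\langle y | U_i | x \rangle|^2 \Big| \\
  &\le \sup_{A \in B(S_1^d)} \Big| \sum_{i=1}^N \varepsilon_i |\mathrm{Tr}\, U_i A|^2 \Big| .
\end{align*}

The last inequality follows from the fact that
$B(S_1^d) = \mathrm{conv}\{ |x\rangle\langle y|, \ |x| = |y| = 1 \}$. Let $\Phi : B(S_1^d) \to \mathbf{R}^N$
be defined as

\[ \Phi(A) = \big( |\mathrm{Tr}\, U_1 A|^2, \dots, |\mathrm{Tr}\, U_N A|^2 \big) . \]

We now apply Dudley's inequality (theorem A2 in the next section) with $K = \Phi(B(S_1^d))$ to estimate
$\mathbf{E} Z$ using covering numbers. This yields

\[ \mathbf{E} Z \le C \int_0^\infty \sqrt{\log N(\Phi(B(S_1^d)), |\cdot|, \varepsilon)} \, d\varepsilon , \]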

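where $|\cdot|$ denotes the Euclidean norm on $\mathbf{R}^N$. Define a distance $\delta$ on $B(S_1^d)$ as

\[ \delta(A, B) = |\Phi(A) - \Phi(B)| = \Big( \sum_{i=1}^N \big( |\mathrm{Tr}\, U_i A|^2 - |\mathrm{Tr}\, U_i B|^2 \big)^2 \Big)^{1/2} . \]

We are led to the estimate

\[ \mathbf{E} Z \le C \int_0^\infty \sqrt{\log N(B(S_1^d), \delta, \varepsilon)} \, d\varepsilon . \]

Using the inequality $\big| |a|^2 - |b|^2 \big| \le |a - b| \cdot |a + b|$, the metric $\delta$ can be upper
bounded as follows:

\[ \delta(A, B)^2 \le \Big( \sum_{i=1}^N |\mathrm{Tr}\, U_i (A + B)|^2 \Big) \sup_{1 \le i \le N} |\mathrm{Tr}\, U_i (A - B)|^2 . \]

Let us introduce a new norm $|||\cdot|||$ on $\mathcal{M}(\mathbf{C}^d)$:

\[ |||A||| = \sup_{1 \le i \le N} |\mathrm{Tr}\, U_i A| . \]

Let $\theta$ be the number defined by

\[ \theta^2 := \sup_{A \in B(S_1^d)} \sum_{i=1}^N |\mathrm{Tr}\, U_i A|^2 \le \sup_{\rho \in \mathcal{D}(\mathbf{C}^d)} \Big\| \sum_{i=1}^N U_i \rho U_i^\dagger \Big\|_\infty . \]

We get that for $A, B \in B(S_1^d)$, $\delta(A, B) \le 2\theta\, |||A - B|||$, and therefore

\[ \mathbf{E} Z \le C\theta \int_0^\infty \sqrt{\log N(B(S_1^d), |||\cdot|||, \varepsilon)} \, d\varepsilon . \]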

It remains to bound this new entropy integral. We split it into three parts, for $\varepsilon_0$ to be
determined. If $\varepsilon$ is large ($\varepsilon \ge 1$), since $\|U_i\|_\infty = 1$, we get that
$|||\cdot||| \le \|\cdot\|_1$. This means that $N(B(S_1^d), |||\cdot|||, \varepsilon) = 1$ and the integrand
is zero. If $\varepsilon$ is small ($0 < \varepsilon < \varepsilon_0$), we use the volumetric argument of
lemma A1:

\[ N(B(S_1^d), |||\cdot|||, \varepsilon) \le N(B(S_1^d), \|\cdot\|_1, \varepsilon) \le (3/\varepsilon)^{2d^2} . \]

In the intermediate range ($\varepsilon_0 \le \varepsilon \le 1$), let $q = \log d$ and
$p = 1 + 1/(\log d - 1)$ be the conjugate exponent. We are going to approximate the Schatten 1-norm by the
Schatten $p$-norm. It is elementary to check that for $A \in \mathcal{M}(\mathbf{C}^d)$,
$\|A\|_q \le e \|A\|_\infty$. By dualizing,

\[ \|A\|_1 \le e \|A\|_p \quad \Longrightarrow \quad N(B(S_1^d), |||\cdot|||, \varepsilon) \le N(B(S_p^d), |||\cdot|||, \varepsilon/e) . \]

We are now in a position to apply lemma A5 to the space $E = S_p^d$. By theorems A3 and A4, the
2-convexity constant of $S_p^d$ and the type 2 constant of $S_q^d$ (see next section for definitions) are
bounded as follows:

\[ T_2(S_q^d) \le \lambda(S_p^d) \le \sqrt{q - 1} \le \sqrt{\log d} . \]

Since $\|U_i\|_q \le e$, the inequality given by lemma A5 is

\[ \sqrt{\log N(B(S_p^d), |||\cdot|||, \varepsilon)} \le \frac{C}{\varepsilon} (\log d)^{3/2} \sqrt{\log N} . \]

We now gather all the estimates:

\[ \int_0^\infty \sqrt{\log N(B(S_1^d), |||\cdot|||, \varepsilon)} \, d\varepsilon
   \le \int_0^{\varepsilon_0} \sqrt{2d^2 \log(3/\varepsilon)} \, d\varepsilon
   + C (\log d)^{3/2} \sqrt{\log N} \int_{\varepsilon_0}^1 \frac{1}{\varepsilon} \, d\varepsilon . \]

Choosing $\varepsilon_0 = 1/d$, an immediate computation shows that

\[ \int_0^\infty \sqrt{\log N(B(S_1^d), |||\cdot|||, \varepsilon)} \, d\varepsilon \le C (\log d)^{5/2} \sqrt{\log N} . \]

This concludes the proof of the lemma. □



Appendix: Geometry of Banach spaces

In this last section, we gather several definitions and results from geometry and probability in
Banach spaces. We denote by $(E, \|\cdot\|)$ a real or complex Banach space (actually, in our applications
$E$ will be finite-dimensional). We denote by $(E^*, \|\cdot\|_*)$ the dual Banach space.

A.1. Covering numbers.

Definition. If $(K, \delta)$ is a compact metric space, the covering number or entropy number
$N(K, \delta, \varepsilon)$ is defined to be the smallest cardinality $M$ of a set
$\{x_1, \dots, x_M\} \subset K$ so that

\[ K \subset \bigcup_{i=1}^M B(x_i, \varepsilon) , \]

where $B(x, \varepsilon) = \{ y \in K \text{ s.t. } \delta(x, y) \le \varepsilon \}$.

An especially important case is when $K$ is a subset of $\mathbf{R}^n$ and $\delta$ is induced by a norm.
The next lemma is proved by a volumetric argument (see [13], Lemma 9.5).

Lemma A1. If $\|\cdot\|$ is a norm on $\mathbf{R}^n$ with unit ball $K$, then for every $\varepsilon > 0$,
$N(K, \|\cdot\|, \varepsilon) \le (1 + 2/\varepsilon)^n$.

The following theorem gives upper bounds on Bernoulli averages involving covering numbers. For
a proof, see Lemma 4.5 and Theorem 11.17 in [13].

Theorem A2 (Dudley's inequality). Let $(\varepsilon_i)$ be independent Bernoulli random variables and $K$
be a compact subset of $\mathbf{R}^n$. Denote by $(x_1, \dots, x_n)$ the coordinates of a vector
$x \in \mathbf{R}^n$. Then for some absolute constant $C$,

\[ \mathbf{E} \max_{x \in K} \sum_{i=1}^n \varepsilon_i x_i \le C \int_0^\infty \sqrt{\log N(K, |\cdot|, \varepsilon)} \, d\varepsilon , \]

where $|\cdot|$ denotes the Euclidean norm on $\mathbf{R}^n$.

A.2. 2-convexity.

Definition. A Banach space $(E, \|\cdot\|)$ is said to be 2-convex with constant $\lambda$ if for any
$y, z \in E$, we have

\[ \|y\|^2 + \lambda^{-2} \|z\|^2 \le \frac{1}{2} \big( \|y + z\|^2 + \|y - z\|^2 \big) . \]

The smallest such $\lambda$ is called the 2-convexity constant of $E$ and denoted by $\lambda(E)$.

We say shortly that "$E$ is 2-convex" while the usual terminology should be "$E$ has a modulus of
convexity of power type 2". This should not be confused with the notion of 2-convexity for Banach
lattices [14].

It follows from the parallelogram identity that a Hilbert space is 2-convex with constant 1. Other
examples are $\ell_p$ and $S_p^d$ for $1 < p \le 2$. The next theorem has been proved by Ball, Carlen and
Lieb [4], refining on early work by Tomczak-Jaegermann [16].

Theorem A3. For $p \le 2$, the following inequality holds for $A, B \in \mathcal{M}(\mathbf{C}^d)$:

\[ \|A\|_p^2 + (p - 1) \|B\|_p^2 \le \frac{1}{2} \big( \|A + B\|_p^2 + \|A - B\|_p^2 \big) . \]

Therefore, $S_p^d$ is 2-convex with constant $1/\sqrt{p - 1}$.

This property nicely dualizes. Indeed, it is easily checked (see [4], lemma 5) that $E$ is 2-convex with
constant $\lambda$ if and only if, for every $y, z \in E^*$,

\[ \|y\|_*^2 + \lambda^2 \|z\|_*^2 \ge \frac{1}{2} \big( \|y + z\|_*^2 + \|y - z\|_*^2 \big) . \]

In this case, $E^*$ is said to be 2-smooth with constant $\lambda$.

A.3. Type 2.

Definition. A Banach space $(E, \|\cdot\|)$ is said to have type 2 if there exists a constant $T_2$ so that
for any finite sequence $y_1, \dots, y_N$ of vectors of $E$, we have

\[ \Big( \mathbf{E} \Big\| \sum_{i=1}^N \varepsilon_i y_i \Big\|^2 \Big)^{1/2} \le T_2 \Big( \sum_{i=1}^N \|y_i\|^2 \Big)^{1/2} . \tag{7} \]

The smallest possible $T_2$ is called the type 2 constant of $E$ and denoted $T_2(E)$. Here, the expectation
$\mathbf{E}$ is taken with respect to a sequence $(\varepsilon_i)$ of independent Bernoulli random variables.

It follows from the (generalized) parallelogram identity that a Hilbert space has type 2 with constant
1, and there is actually equality in (7). If a Banach space $E$ is 2-convex, then $E^*$ is 2-smooth. It is
easily checked (by induction on the number of vectors involved) that a 2-smooth Banach space has type
2 with the same constant. We therefore have the inequality $T_2(E^*) \le \lambda(E)$. In particular, theorem
A3 implies the following result, first proved by Tomczak-Jaegermann [16] with a worse constant.

Theorem A4. If $q \ge 2$, then $S_q^d$ has type 2 with the estimate

\[ T_2(S_q^d) \le \sqrt{q - 1} . \]

A.4. An entropy lemma. The following lemma plays a key role in our proof. It appears as Lemma
1 in [10].

Lemma A5. Let $E$ be a Banach space with unit ball $B(E)$. Assume that $E$ is 2-convex with constant
$\lambda(E)$. Let $x_1, \dots, x_N$ be elements of $E^*$, and define a norm $|||\cdot|||$ on $E$ as

\[ |||y||| = \max_{1 \le i \le N} |x_i(y)| . \]

Then for any $\varepsilon > 0$ we have, for some absolute constant $C$,

\[ \varepsilon \sqrt{\log N(B(E), |||\cdot|||, \varepsilon)} \le C \lambda(E)^2\, T_2(E^*) \sqrt{\log N} \max_{1 \le i \le N} \|x_i\|_{E^*} . \tag{8} \]

The proof of lemma A5 is based on a duality argument for covering numbers coming from [6]. A
positive answer to the duality conjecture for covering numbers (see [3] for a statement of the conjecture
and recent results) would imply that the inequality (8) is valid without the factor $\lambda(E)^2$. This
would improve our estimate in theorem 2 to $N \ge Cd(\log d)^4/\varepsilon^2$.